COMPUTATIONAL  NEUROSCIENCE 

A  general  role  for  medial  prefrontal  cortex  in  event 
prediction 

William H. Alexander1,2* and Joshua W. Brown2

1  Department  of  Experimental  Psychology,  Ghent  University,  Gent,  Belgium 

2  Department  of  Psychological  and  Brain  Sciences,  Indiana  University,  Bloomington,  Bloomington,  IN,  USA 


ORIGINAL  RESEARCH  ARTICLE 

published:  11  July  2014 
doi: 10.3389/fncom.2014.00069


Edited  by: 

David  Hansel,  University  of  Paris, 
France 

Reviewed  by: 

Kenji  Morita,  The  University  of 
Tokyo,  Japan 

Emmanuel  Procyk,  Institut  National 
de  la  Sante  et  de  la  Recherche 
Medicale,  France 

*Correspondence: 

William  H.  Alexander,  Department 
of  Experimental  Psychology,  Ghent 
University,  Henri  Dunantlaan  2, 
B-9000  Gent,  Belgium 
e-mail: william.alexander@ugent.be


A  recent  computational  neural  model  of  medial  prefrontal  cortex  (mPFC),  namely  the 
predicted  response-outcome  (PRO)  model  (Alexander  and  Brown,  2011),  suggests  that 
mPFC  learns  to  predict  the  outcomes  of  actions.  The  model  accounted  for  a  wide  range  of 
data  on  the  mPFC.  Nevertheless,  numerous  recent  findings  suggest  that  mPFC  may  signal 
predictions  and  prediction  errors  even  when  the  predicted  outcomes  are  not  contingent 
on  prior  actions.  Here  we  show  that  the  existing  PRO  model  can  learn  to  predict  outcomes 
in  a  general  sense,  and  not  only  when  the  outcomes  are  contingent  on  actions.  A  series 
of  simulations  show  how  this  generalized  PRO  model  can  account  for  an  even  broader 
range of findings in the mPFC, including human ERP, fMRI, and macaque single-unit data.
The  results  suggest  that  the  mPFC  learns  to  predict  salient  events  in  general  and  provides 
a  theoretical  framework  that  links  mPFC  function  to  model-based  reinforcement  learning, 
Bayesian  learning,  and  theories  of  cognitive  control. 

INTRODUCTION

Medial prefrontal cortex (mPFC), especially dorsal anterior
cingulate cortex (ACC), has been repeatedly and extensively
implicated in processing and monitoring behavior and action
(Falkenstein et al., 1991; Carter et al., 1998; Shima and Tanji,
1998; Botvinick et al., 2001; Holroyd and Coles, 2002; Behrens
et al., 2007; Matsumoto et al., 2007; Rudebeck et al., 2008). A new
unified model of the mPFC, the predicted response-outcome (PRO)
model (Alexander and Brown, 2011), proposes that mPFC learns
predictions of future outcomes, and signals unexpected
non-occurrences of predicted outcomes. The model comprehensively
accounts  for  a  range  of  results  observed  in  mPFC  (including 
from  fMRI,  EEG,  and  single-unit  neurophysiology)  in  the  context 
of  cognitive  control,  including  effects  of  error,  conflict,  error 
likelihood,  and  several  others. 

While  earlier  simulations  of  the  PRO  model  focused  on  the 
role  of  mPFC  in  predicting  the  outcomes  of  actions,  the  mPFC 
is also engaged in tasks without a significant behavioral
component, or when a specific motor command is neither planned
nor executed (Buchel et al., 2002; Chandrasekhar et al., 2008), in
processing novel stimuli (Dien et al., 2003; Crottaz-Herbette and
Menon, 2006), in predicting task-related stimuli that cue future
behavior but require no immediate response (Koyama et al., 2001;
Aarts et al., 2008; Aarts and Roelofs, 2010), and in response to
painful stimuli (Buchel et al., 2002; Chandrasekhar et al., 2008).
These findings suggest a role for mPFC in deploying attention
(Bryden et al., 2011; Vachon et al., 2012) and processing novelty or
salience (Downar et al., 2002; Litt et al., 2011; Wessel et al., 2012).

These  findings  present  a  significant  challenge  to  accounts  of 
mPFC  function  that  emphasize  its  role  in  the  regulation  and 
correction of behavior alone. Furthermore, theories regarding
mPFC function will necessarily be incomplete so long as findings
regarding the role of mPFC in processing stimuli remain
unexplained. One possibility is that stimulus-related activity in mPFC
reflects  a  separate,  independent  function  of  mPFC  which  operates 
concurrently  with  mPFC  involvement  in  control  of  behavior.  A 
second  option  is  that  these  findings  are  a  product  of  the  same 
mechanisms  that  produce  effects  in  mPFC  related  to  action  and 
outcome. 

Can  the  same  principle  that  informed  the  PRO  model, 
prediction of likely outcomes and detection of unexpected
non-occurrence, be deployed to explain mPFC activity related to
task-related cues? In order to answer this question, we first (re)consider
what we mean by “outcome”. In the original PRO model,
outcomes were conceived as events, usually reflecting
performance-related feedback, occurring at the end of a trial. After the model
was  presented  with  an  outcome,  all  learning  within  the  model 
ceased  and  all  activity  was  set  to  0  in  order  to  prepare  the  model 
for  the  next  trial. 

In  reality,  however,  a  person’s  experience  is  not  divided  into 
discrete  trials  in  this  fashion.  Even  in  the  highly-constrained 
reality  of  a  behavioral  experiment,  trials  are  followed  by  still  more 
trials, each identical to the last modulo experimental
manipulations. Each time an “outcome” is observed by a subject, it is
reliably  followed  by  a  stimulus  indicating  the  onset  of  a  new 
trial,  which  is  itself  followed  by  another  outcome,  ad  infinitum 
(or  at  least  until  the  experimenter  allows  the  subject  to  leave). 
From  this  perspective,  the  distinction  between  an  outcome  and  a 
stimulus  becomes  ambiguous,  with  the  difference  seeming  to  rest 
on experimenter fiat.

With  this  in  mind,  we  propose  a  modest  extension  to  the 
original  PRO  model  (Figure  1).  Namely,  in  the  extended  PRO 
model, we regard stimuli and their associated outcomes as generic
events,  where  events  are  considered  to  be  any  salient  sensory 
input  that  can  be  associated  with  subsequent  events,  and  may 
itself  be  predicted  by  previous  events.  It  is  essential  to  note  from 
the  outset  that  this  extension  is  a  conceptual  expansion  only,  and 
the  extended  PRO  model  below  is  identical  to  the  original  PRO 
model,  including  all  the  same  equations  and  parameters.  With  this 
simple  conceptual  extension,  we  are  able  to  demonstrate  how 
the  PRO  model,  in  addition  to  accounting  for  mPFC  activity 
associated  with  response  monitoring,  can  reproduce  a  range  of 
effects  observed  in  mPFC  and  related  primarily  to  processing 
sensory stimuli from fMRI, EEG, and single-unit neurophysiological
studies. These findings provide additional evidence that the
hypothesis  underlying  the  PRO  model,  that  mPFC  is  involved  in 
prediction  and  detecting  discrepancies,  is  the  most  comprehensive 
account  of  mPFC  function  to  date. 

METHODS 

The  PRO  model  was  developed  to  account  for  mPFC  activity 
related  to  the  prediction  of  response-outcome  conjunctions,  and 
signaling unexpected deviations from expected outcomes. In our
extended implementation of the PRO model, we generalize these
two basic functions of the PRO model to include prediction of any
salient sensory event (including outcomes), as well as signaling
deviations from expected events. In order to describe our
implementation of the extended model, we first review relevant equations
from the original model, and then show how these equations have
been updated to generalize the events they represent.

PRO  MODEL 

In order to explain effects observed in mPFC related to the
prediction and observation of outcomes following a behavioral
response, the original PRO model is based on standard
reinforcement learning (RL) models, especially temporal difference (TD)
learning (Sutton, 1988), that have been extended in the following
ways. First, in typical formulations, RL models learn a scalar
prediction of the discounted value of the current state. In contrast,
the PRO model learns predictions of multiple possible outcomes,
regardless of their affective valence, using a vector-valued error
signal. Activity in the PRO model therefore reflects a temporally
discounted prediction of various outcomes in proportion to their
probability of occurrence. Second, mPFC effects related to error
are explained as “negative surprise”, a value which reflects the
aggregate of outcome predictions generated by the model minus
observed outcomes. The PRO model represents time as a
tapped-delay line in which each unit reflects the amount of time elapsed
since the presentation of a stimulus. Each iteration of the model
was interpreted as lasting 10 ms.

FIGURE 1 | Model schematics. In the original publication of the PRO
model (A), the model learned predictions of future outcomes (e.g., error or
correct feedback) based on task-related cues such as those observed in
the Eriksen flanker task. In our extension to the PRO model (B and C), the
model continues to learn the association between task-related cues and
feedback (B). Task-related feedback then acts as a stimulus in its own right
in order to learn associations between feedback and future task-related
cues (C).

Formally,  predictions  in  the  model  are  computed  as: 

P_{i,t} = \sum_{j,k} S_{jk,t} \, W_{ijk,t}    (1)

where  S  is  the  tapped-delay  representation  of  a  stimulus,  W 
are  learned  prediction  weights  associating  stimuli  with  possible 
outcomes,  P,  and  i,j  and  k  index  outcomes,  tapped-delay  units, 
and  stimulus  identity,  respectively.  Weights  are  updated  according 
to: 

W_{ijk,t+1} = W_{ijk,t} + \alpha \, \delta_{i,t} \, \bar{S}_{jk,t}    (2)

where \alpha is a learning rate parameter. W is further constrained by
W > 0. \bar{S} is an eligibility trace computed as:

\bar{S}_{jk,t+1} = S_{jk,t} + 0.95 \, \bar{S}_{jk,t}    (3)

Finally, \delta is a TD error:

\delta_{i,t} = O_{i,t} + \gamma P_{i,t+1} - P_{i,t}    (4)

where O is the outcome i observed on the current model iteration
t, and \gamma is a temporal discount factor (\gamma = 0.95).
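
As a rough illustration of how Eqs. 1-4 fit together, the following Python sketch runs the four updates on a toy task with one cue and one outcome. It is not the published implementation. The learning rate, the toy task, the function names, and the clipping used to keep W non-negative are all assumptions made for the example. The sketch also assumes that the eligibility trace is updated with the current input before the weight update, which is the usual TD ordering; Eq. 3 could be read with a different indexing.

```python
import numpy as np

GAMMA = 0.95        # temporal discount factor (from the text)
TRACE_DECAY = 0.95  # eligibility-trace decay (Eq. 3)
ALPHA = 0.1         # learning rate: illustrative value only, not the published one


def tapped_delay(onsets, t, n_delay):
    """S[j, k] = 1 if stimulus k was presented exactly j iterations before t."""
    S = np.zeros((n_delay, len(onsets)))
    for k, onset in enumerate(onsets):
        if onset is not None and 0 <= t - onset < n_delay:
            S[t - onset, k] = 1.0
    return S


def predict(S, W):
    """Eq. 1: P[i] = sum over j, k of S[j, k] * W[i, j, k]."""
    return np.tensordot(W, S, axes=([1, 2], [0, 1]))


def pro_step(W, trace, S_now, S_next, O_now):
    """One model iteration (10 ms) of Eqs. 1-4."""
    # Eq. 3 (assumed ordering: the trace includes the current input)
    trace = S_now + TRACE_DECAY * trace
    # Eq. 4: TD error, one entry per predicted outcome
    delta = O_now + GAMMA * predict(S_next, W) - predict(S_now, W)
    # Eq. 2: weight update gated by the eligibility trace
    W = W + ALPHA * delta[:, None, None] * trace[None, :, :]
    # Clipping at zero is one simple way to keep the weights non-negative
    return np.maximum(W, 0.0), trace, delta


def train_single_cue(n_trials=200, trial_len=10, outcome_time=5):
    """Hypothetical toy task: one cue at t = 0, one outcome at t = outcome_time."""
    W = np.zeros((1, trial_len, 1))  # (outcomes, delay units, stimuli)
    for _ in range(n_trials):
        trace = np.zeros((trial_len, 1))  # reset between trials, as in the original model
        for t in range(trial_len):
            S_now = tapped_delay([0], t, trial_len)
            S_next = tapped_delay([0], t + 1, trial_len)
            O_now = np.array([1.0 if t == outcome_time else 0.0])
            W, trace, _ = pro_step(W, trace, S_now, S_next, O_now)
    return W
```

With these settings the learned prediction at cue onset approaches 0.95^5 (about 0.774), the discounted prediction of an outcome five iterations ahead.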

EXTENDED  MODEL 

As  described  above,  the  central  premise  underlying  our  extended 
implementation  of  the  PRO  model  is  that  outcomes  and  the 
stimuli  which  precede  them  can  be  regarded  as  generic  events, 
by  which  we  mean  any  salient  information  (i.e.,  experimental 
variables)  a  subject  may  encounter  in  the  course  of  an  experiment, 
up  to  and  including  information  that  may  not  pertain  to  the 
experimental  task  as  such  but  merely  signals  the  onset  of  a  new 
trial  (e.g.,  fixation  points).  Accordingly,  the  relevant  equations 
given  above  are  rewritten  as: 
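
P_{i,t} = \sum_{j,k} E_{jk,t} \, W_{ijk,t}    (5)

W_{ijk,t+1} = W_{ijk,t} + \alpha \, \delta_{i,t} \, \bar{E}_{jk,t}    (6)

\bar{E}_{jk,t+1} = E_{jk,t} + 0.95 \, \bar{E}_{jk,t}    (7)

\delta_{i,t} = E_{i,t} + \gamma P_{i,t+1} - P_{i,t}    (8)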

These  equations  are  identical  with  Eqs.  1-4,  with  the  exception 
that  all  instances  of  S  and  O  are  now  replaced  by  E,  reflecting 
the  more  general  role  of  both  stimuli  and  outcomes  as  events 
that  can  be  predicted  as  well  as  serve  as  the  basis  for  predicting 
future  events.  In  order  to  accommodate  learning  predictions 
about  the  relationship  between  events,  broadly  construed,  the 
model  was  further  altered  by  allowing  learning  to  occur  even 
after  the  conclusion  of  a  trial.  Finally,  activity  in  the  model  was 
computed  as  “negative  surprise”: 

\omega^{N}_{t} = \sum_{i} [P_{i,t} - E_{i,t}]^{+}    (9)


reflecting  the  unexpected  non-occurrence  of  a  predicted  event. 
Except  for  simulation  6  (discussed  below),  this  measure  of  model 
activity  is  used  in  all  simulations. 
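
The negative surprise measure can be sketched in a few lines of Python. This is an illustrative reading of Eq. 9, assuming the bracket denotes rectification (negative values set to zero); the function name and the `summed` flag are hypothetical. The unsummed form corresponds to the per-unit activity used for the single-unit simulation.

```python
import numpy as np


def negative_surprise(P, E, summed=True):
    """Rectified difference between predicted (P) and observed (E) events."""
    per_unit = np.maximum(np.asarray(P) - np.asarray(E), 0.0)
    # Summed: aggregate model activity. Unsummed: one value per predicted event.
    return per_unit.sum() if summed else per_unit
```

For example, if the model predicts two events with strengths 0.8 and 0.2 and only the second occurs, only the first prediction contributes: the unmet prediction of 0.8 is the negative surprise, while the event that occurred more strongly than predicted contributes nothing.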

In  addition  to  these  four  core  equations,  the  original  PRO 
model  incorporated  mechanisms  by  which  the  model  was  able 
to interact with simulated cognitive control tasks. These
mechanisms remain unchanged, and the parameters used for previous
simulations  are  the  same  as  previously  reported  (Alexander  and 
Brown,  2011).  These  parameters  were  derived  from  model  fits 
to  behavioral  data  from  a  previously  reported  study  (Brown 
and  Braver,  2005).  Model  parameters  were  not  altered  from  one 
simulation  to  the  next.  For  simulations  in  which  an  event  was  not 
associated  with  a  particular  behavior  (e.g.,  experiments  in  which 
certain  stimuli  do  not  require  a  response),  stimulus-response 
weights  in  the  model  were  set  to  0. 

SIMULATIONS 

Unless otherwise noted, simulated experiments included 10
individual simulations, each corresponding to a single subject, of the
PRO  model  in  the  tasks  described  below.  In  each  task,  or  in  each 
experimental  condition  within  each  task,  the  model  was  presented 
with  300  trials.  At  the  beginning  of  each  individual  simulation, 
adjustable  model  weights  were  set  to  0.  Because  trials  for  each  task 
were  selected  randomly,  and  because  responses  were  influenced 
both  by  learned  and  static  weights  as  well  as  by  an  additional  noise 
component,  the  development  of  activity  in  the  model  varied  from 
one  individual  simulation  to  the  next.  In  our  simulations,  we  did 
not  simulate  variability  in  inter-trial  or  inter-stimulus  intervals 
due  to  the  dependence  of  the  model  on  consistent  timing  of 
events  to  converge  (resulting  from  its  formulation  based  on  TD 
learning). 

SIMULATION  1:  FREQUENT  VS.  INFREQUENT  TRIALS 

Effects  of  trial  frequency  on  model  activity  were  simulated  using 
an  Eriksen  flanker  task  (Eriksen,  1995)  in  two  separate  simulated 
experiments  in  which  the  frequency  of  trial  types  (congruent 
and  incongruent)  was  manipulated.  In  the  frequent  condition  for 
both  experiments,  frequent  trials  were  observed  approximately 
75%  of  the  time,  while  infrequent  trials  were  approximately 
25%  of  all  trials.  A  total  of  eight  events  were  modeled:  left  and 
right  target  cues,  left  and  right  flanker  cues,  as  well  as  the  four 
possible  response-outcome  conjunctions  (left/error,  right/error, 
left/correct,  right/correct).  Model  activity  was  averaged  over  the 
first  20  model  iterations  following  the  onset  of  the  target  and 
flanker  cues. 

SIMULATION  2:  ITEM-SPECIFIC  VS.  GLOBAL  CONTROL 

The  model  was  run  in  three  separate  simulated  experiments  using 
a  version  of  the  Stroop  task  (Stroop,  1935)  in  which  the  frequency 
of  congruent  vs.  incongruent  trials  was  manipulated  both  at  a 
global  level,  as  well  as  at  the  level  of  individual  stimuli  as  in  Blais 
and  Bunge  (2010).  In  each  experiment,  two  classes  of  stimuli 
were  used.  In  each  stimulus  class,  two  specific  colors  could  be 
combined  to  generate  Stroop  stimuli.  For  example,  one  stimulus 
class  might  include  the  colors  red  and  green  used  to  generate 
incongruent  and  congruent  stimuli — the  word  “red”  displayed 
in  green  font,  or  vice  versa  (incongruent  trials),  or  the  word 
“red” (or green) displayed in red (or green) font (congruent
trials),  while  the  2nd  stimulus  class  would  generate  stimuli  using 
two  different  colors  (e.g.,  yellow  and  blue).  In  each  experiment, 
both  the  global  probability  of  observing  an  incongruent  trial 
(regardless  of  stimulus  class),  as  well  as  the  item-specific  (class- 
dependent)  probability  of  observing  an  incongruent  trial  were 
manipulated.  In  the  1st  experiment,  the  global  probability  of 
observing  an  incongruent  trial  was  0.3,  while  the  item-specific 
probability was 0.1 and 0.5 for the two stimulus classes. In the
second  experiment,  both  the  global  and  item-specific  probabilities 
of  an  incongruent  trial  were  0.5.  Finally,  in  experiment  3,  the 
global  probability  was  0.7  while  the  item-specific  probabilities 
were  0.5  and  0.9  for  the  two  stimulus  classes.  In  each  simulated 
experiment,  a  total  of  eight  events  were  modeled:  one  event  for 
each color word, and one for each font color, as well as four
possible response-outcome conjunctions (Color1/Error, Color2/Error,
Color1/Correct, Color2/Correct). For each experiment, the model
was  simulated  for  200  trials,  and  model  activity  was  averaged  over 
the  first  20  iterations  following  presentation  of  the  stimulus. 
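
The global probabilities listed above equal the mean of the two item-specific probabilities if the two stimulus classes are presented equally often. The text does not state the class proportions, so equal presentation is an assumption here, and the function names are hypothetical. A minimal Python sketch of the arithmetic and of one way to draw such a trial sequence:

```python
import random


def global_incongruent_probability(item_probs):
    """Global P(incongruent), assuming every stimulus class is equally likely."""
    return sum(item_probs) / len(item_probs)


def sample_trials(item_probs, n_trials, seed=0):
    """Draw (stimulus_class, is_incongruent) pairs under the same assumption."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        stim_class = rng.randrange(len(item_probs))
        is_incongruent = rng.random() < item_probs[stim_class]
        trials.append((stim_class, is_incongruent))
    return trials
```

Under this assumption, item-specific probabilities of 0.1 and 0.5 give a global probability of 0.3, and 0.5 and 0.9 give 0.7, matching experiments 1 and 3.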

SIMULATION  3:  MISMATCH  NEGATIVITY 

The mismatch negativity (MMN) was simulated as a punctate
stimulus  presented  to  the  model  that  repeated  every  30  model 
iterations  (300  ms).  Since  no  response  was  required  by  the  model, 
components  of  the  PRO  model  related  to  response  generation 
were  lesioned  by  setting  all  weights  for  connections  projecting 
to  and  from  those  components  to  0.  The  model  was  trained  on 
the  repeating  stimulus  for  200  repetitions,  following  which  single 
trials were simulated in which the stimulus was withheld
following a number of repetitions (1-7). Model activity for all trials
involving a withheld stimulus was averaged together regardless of
the number of stimulus repetitions observed prior to the withheld
stimulus, and activity was recorded from the 40 iterations prior
to  the  usual  presentation  time  of  the  stimulus  to  20  iterations  after 
the  usual  presentation.  Model  activity  for  non-mismatch  trials 
was  averaged  over  all  trials  in  which  a  stimulus  was  presented  as 
expected,  and  activity  was  recorded  as  for  mismatch  trials. 
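
The stimulus schedule and recording window described above can be sketched as follows. This Python fragment only builds the event train and extracts the window; it does not run the PRO model. The function names are hypothetical, and treating the window as 40 iterations before the expected onset up to (but not including) 20 iterations after it is an assumption about the exact boundaries.

```python
import numpy as np

PERIOD = 30  # iterations between stimulus onsets (300 ms at 10 ms per iteration)


def stimulus_train(n_repetitions, withhold_last=True, period=PERIOD):
    """Binary event vector: n_repetitions stimuli, then one expected slot."""
    events = np.zeros((n_repetitions + 1) * period)
    events[::period] = 1.0  # one onset at the start of every period
    if withhold_last:
        events[n_repetitions * period] = 0.0  # omit the final, expected stimulus
    return events


def recording_window(activity, expected_onset, pre=40, post=20):
    """Slice of model activity around the usual presentation time."""
    return activity[expected_onset - pre:expected_onset + post]
```

For instance, `stimulus_train(3)` places stimuli at iterations 0, 30, and 60 and leaves the slot at iteration 90 empty, so a window around iteration 90 covers the omitted stimulus.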

SIMULATION  4:  INFORMATIVE  VS.  UNINFORMATIVE  CUES 

The  task  used  by  Aarts  et  al.  (2008)  was  an  arrow-word  version 
of  the  Stroop  task  in  which  subjects  were  presented  with  both  a 
word  and  visual  cue  indicating  the  direction  in  which  they  should 
respond  (e.g.,  the  word  “right”  printed  within  an  arrow  pointing 
left). On congruent trials, both the word and the visual cue
indicated the same direction, while on incongruent trials, the word
and visual cue indicated opposite responses. Prior to the onset of
the task itself, subjects were presented one of three possible cues,
which indicated that the upcoming task would involve
an incongruent trial (approximately 1/3 of all trials) or a congruent
trial (approximately 1/3 of all trials), or provided no information
as to the nature of the trial (approximately 1/3 of all trials). A
total of 11 events were modeled: 1 for each of the cue conditions
(informed/congruent, informed/incongruent, uninformative), 3
events for task stimuli (1 for the central target stimulus, and
1 each for congruent and incongruent flankers) and 4 for the
possible response-outcome conjunctions (left/error, left/correct,
right/error, right/correct). Model activity was averaged over the 20
iterations following cue presentation for cue-related effects, and
averaged  over  the  20  iterations  following  presentation  of  the  trial 
(and  preceding  the  model  response  or  feedback)  for  target-related 
effects. 

SIMULATION 5: BAYESIAN SURPRISE

In  the  stop  signal  task,  subjects  are  presented  with  a  cue  indicating 
that  a  response  is  to  be  made.  On  a  subset  of  trials,  the  subjects 
are  subsequently  presented  with  a  second  cue  indicating  that  the 
subject  should  cancel  the  response  to  the  first  cue.  We  simulated 
the  PRO  model  performing  the  stop  signal  task  with  the  same 
frequency  of  go  vs.  stop  trials  reported  in  Ide  et  al.  (2013)  (75% 
and  25%,  respectively).  Model  activity  was  averaged  over  the  20 
model  iterations  following  the  presentation  of  a  Stop  cue.  For  each 
trial,  the  probability  of  observing  a  stop  trial  was  calculated  as 
the proportion of stop trials over the previous ten trials. High and low
probability  trials  were  classified  by  a  median  split  of  the  estimated 
probabilities  of  all  trials  experienced  by  the  model.  Seven  events 
were  modeled:  1  for  the  fixation  point  presented  at  the  beginning 
of  each  trial,  1  each  for  the  go  and  stop  signals,  and  4  for  the 
possible  response-outcome  conjunctions  (Go/Correct,  Go/Error, 
Stop/Correct,  and  Stop/Error). 
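
The trial-by-trial stop probability and the median split described above can be sketched as follows. This is an illustrative Python reading of the procedure, not the authors' code. The handling of the first ten trials (no estimate, excluded from the split) and of ties at the median (assigned to the low group) are assumptions, as are the function names.

```python
import numpy as np


def stop_probability(is_stop, window=10):
    """Proportion of stop trials among the `window` trials preceding each trial."""
    is_stop = np.asarray(is_stop, dtype=float)
    probs = np.full(len(is_stop), np.nan)  # no estimate until a full window exists
    for n in range(window, len(is_stop)):
        probs[n] = is_stop[n - window:n].mean()
    return probs


def median_split(probs):
    """Boolean masks for high- and low-probability trials."""
    valid = ~np.isnan(probs)
    median = np.median(probs[valid])
    high = np.zeros(len(probs), dtype=bool)
    low = np.zeros(len(probs), dtype=bool)
    high[valid] = probs[valid] > median
    low[valid] = probs[valid] <= median  # ties go to the low group (assumption)
    return high, low
```

Model activity following the Stop cue would then be averaged separately over the trials selected by each mask.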

SIMULATION  6:  SINGLE-UNIT  ACTIVITY 

In  the  expect  reward  task  (Sallet  et  al.,  2007)  conducted  with 
monkeys, the animal was presented with a cue indicating the
magnitude of a reward that would be delivered following a subsequent
presentation  of  the  same  cue.  Reward  magnitudes  could  be  either 
small,  medium  or  large.  On  a  subset  of  trials  in  the  large  and  small 
magnitude  conditions,  the  cue  for  the  opposite  reward  (small 
instead  of  large,  large  instead  of  small)  was  presented  following 
the  initial  cue.  We  simulated  the  PRO  model  on  200  trials  of  the 
expect  reward  task.  A  total  of  10  events  were  modeled:  1  event 
for  the  starting  position  presented  at  the  beginning  of  the  trial, 
3  events  represented  the  reward  magnitude  cues  during  the  Cue 
phase  of  the  trial,  3  events  represented  the  reward  cues  presented 
during the Go phase of each trial, and 3 events represented the
reward  received  (small,  medium,  or  large).  Note  that  the  activity 
of  reward  events  was  binary,  and  was  intended  to  simulate  the 
identity  of  the  reward  rather  than  its  salience  or  value.  This  is 
consistent  with  the  theory  underlying  the  PRO  model  that  states 
that  mPFC  learns  the  likely  outcomes  of  actions  rather  than  the 
value of those outcomes. Cue-related activity was
averaged  over  20  iterations  following  the  presentation  of  the  first 
cue  and,  separately,  following  the  presentation  of  the  second  cue. 
Since  we  sought  to  account  for  single-unit  activity,  the  activity  of 
single  units  in  the  model  was  computed  as  in  Eq.  9,  but  the  results 
were  not  summed. 

RESULTS 

In  previously  published  simulations  (Alexander  and  Brown, 
2011),  we  selected  tasks  on  which  to  test  the  PRO  model  based 
on  their  potential  to  highlight  a  key  strength  of  the  model. 
Namely,  we  showed  how  the  straightforward  intuition  underlying 
the  model,  that  mPFC  predicts  future  outcomes  and  signals 
deviations  from  expectations,  can  account  for  a  wide  range  of 
data  under  a  single,  unifying  framework.  Specifically,  we  showed 
how the PRO model accounted for data from fMRI, EEG, and
single-unit  neurophysiology  studies,  while  also  showing  how 
Bayesian  accounts  of  mPFC  activity  could  be  reconciled  with 
RL  formulations.  At  the  same  time,  we  demonstrated  that  the 
PRO  model,  beyond  capturing  effects  also  accounted  for  by 
other  models  of  mPFC  (e.g.,  Botvinick  et  al.,  2001;  Holroyd 
and Coles, 2002; Brown and Braver, 2005), could additionally
reproduce patterns of activity competing models could not
(e.g., Amador et al., 2000; Amiez et al., 2006; Jessup et al.,
2010). 

Our  goal  in  the  present  study  is  similar,  in  that  we  seek  to 
demonstrate  how,  with  a  minimal  amount  of  alteration,  the  PRO 
model  may  be  extended  to  address  results  from  the  neuroscience 
literature  showing  mPFC  involvement  in  the  expectation  and 
detection  of  stimuli.  Accordingly,  the  data  we  have  chosen 
to simulate include results from fMRI, EEG, and single-unit
neurophysiology  studies,  as  well  as  results  implicating  mPFC  in 
Bayesian  surprise. 


SIMULATION  1:  FREQUENT  VS.  INFREQUENT  TRIALS 

Some fMRI and EEG studies manipulate the relative
frequency of congruent vs. incongruent trials in common
cognitive control tasks (e.g., the Eriksen flanker task or the
Stroop task). They have observed an inverse correlation of
conflict-related effects with the frequency of incongruent
trials (Carter et al., 2000). The PRO model explains this as
an increased prediction of the likelihood of an incongruent
trial occurring in high-frequency incongruent conditions,
with  an  attendant  decrease  in  surprise  when  a  predicted 
incongruent  trial  is  experienced  (Figure  2A).  These  studies 
also  find  that  activity  for  infrequent  incongruent  trials  is 
greater  than  for  infrequent  congruent  trials  when  trials  are 
matched  for  frequency.  The  PRO  model  captures  this  effect 
and  explains  it,  as  in  previously  published  simulations,  as 
the  effect  of  multiple  concurrent  predictions  for  incongruent 
trials  that  proceed  from  the  appearance  of  an  incongruent 
stimulus. 

FIGURE 2 | Trial frequency, item-level control, and the mismatch
negativity. (A) Activity in the PRO model at the onset of a trial in the
Eriksen flanker task reflects the overall frequency with which a particular
trial type (congruent or incongruent) is observed. When mostly
congruent trials are experienced, infrequent incongruent trials result in
increased model activity relative to congruent trials, while the reverse
holds true for conditions in which mostly incongruent trials are
observed. (B) Activity in the model is proportional to the frequency with
which a particular trial type (e.g., incongruent or congruent) is observed with
respect to a particular stimulus type (e.g., Stroop stimuli constructed
using the color pair RED and GREEN vs. stimuli constructed using the
color pair BLUE and YELLOW), and is not sensitive to the overall
frequency of a trial type without regard for stimulus types. (C) Activity
in the PRO model is greater following the surprise absence of a
stimulus that commonly occurs as part of a sequence of stimuli (cf.
Crottaz-Herbette and Menon, 2006).

SIMULATION 2: ITEM-LEVEL VS. GLOBAL CONTROL

The  conflict  model  of  ACC/mPFC  suggests  that  cognitive  control 
is  proportional  to  the  global  statistics  of  a  task;  as  the  proportion 
of  incongruent  trials  increases,  so  too  does  the  overall  need  for 
top-down  control  to  be  deployed  in  order  to  successfully  perform 
a  task,  with  a  resultant  decrease  in  levels  of  conflict-related  activity 
in  ACC.  However,  both  behavioral  and  fMRI  studies  (Bugg  et  al., 
2008;  Blais  and  Bunge,  2010)  investigating  this  prediction  have 
found  that  control  appears  to  depend  on  the  frequency  of  item- 
specific  incongruent  trials;  particular  stimuli  associated  with  a 
higher  proportion  of  incongruent  trials  appear  to  benefit  more 
from  adaptation  effects  relative  to  stimuli  with  a  lower  proportion 
of  incongruent  trials.  Accordingly,  since  the  PRO  model  learns 
predictions of likely events contingent on stimuli presented,
simulated model activity at the onset of incongruent trials is inversely
proportional  to  the  overall  item-specific  frequency  of  incongruent 
trials  (Figure  2B). 

SIMULATION  3:  MISMATCH  NEGATIVITY 

The  MMN  ERP  component  is  observed  when,  in  the  course 
of presentation of a predictable sequence of stimuli, a
particular stimulus within that sequence is surprisingly altered (e.g.,
a high tone rather than a usual low tone) or withheld
altogether. The MMN is most apparent in sensory cortices related
to the stimulus modality, though EEG studies have also
identified generators in frontal cortex, especially mPFC
(Crottaz-Herbette and Menon, 2006) with an onset delayed compared
to  sensory  cortex.  The  PRO  model  accounts  for  the  MMN 
observed  within  mPFC  as  the  surprising  absence  of  a  stimulus 
in  a  sequence  whose  occurrence  was  predicted  by  the  previous 
stimulus  (Figure  2C).  Note  that  because  activity  in  the  PRO 
model  derives  entirely  from  the  unexpected  non-occurrence  of  an 
expected  event,  the  model’s  interpretation  of  the  MMN  remains 
the same regardless of whether a predicted stimulus in a sequence
is  absent,  or  if  a  novel  stimulus  is  inserted  in  its  place  (i.e., 
oddball  paradigm).  In  both  cases,  the  predicted  event  failed  to 
occur. 

SIMULATION  4:  INFORMATIVE  VS.  UNINFORMATIVE  CUES 

Aarts et al. (2008) observed increased activity in ACC
following informative cues (cues which indicated whether the subject
would  subsequently  perform  a  congruent  or  incongruent  trial 
of  a  modified  Stroop  task)  vs.  uninformative  cues.  ACC  activity 
at  the  time  the  cued  task  was  presented  was  lower  following 
informative  cues  relative  to  tasks  occurring  after  uninformative 
cues,  regardless  of  whether  the  trial  itself  was  incongruent  or 
congruent. The PRO model accounts for increased activity
following an informative cue (Figure 3A) as the increased
predictive activity related to the certain occurrence of either an
incongruent  or  congruent  trial  vs.  the  weak  activity  following 
uninformative  cues  related  to  uncertain  predictions  regarding  the 
nature  of  the  next  trial.  Similarly,  activity  at  the  onset  of  the 
target  task  following  an  informative  cue  is  reduced  regardless  of 
trial  type  (Figure  3B)  since  the  model’s  prediction  corresponds 
with  the  observed  event,  while  activity  at  trial-onset  following 
uninformative  cues  reflects  the  unexpected  non-occurrence  of 
at least one of the model’s predictions.

FIGURE 3 | Informative vs. uninformative cues. (A) Activity in the PRO
model is greater when a cue is presented indicating the type of trial (e.g.,
incongruent or congruent) that will be presented to the model in the near
future, as compared to a cue that is uninformative (i.e., equal chance of
either trial type). (B) Conversely, when a trial is presented, model activity is
lower when the trial type has been previously cued compared to trials that
have been preceded by an uninformative cue. (C) When model activity is
recorded over a longer duration following trial presentation, reflecting the
low temporal resolution of fMRI, trial-related activity for incongruent trials
increases relative to high temporal resolution recording (frame B).

Note that although the PRO model captures the broad pattern observed in Aarts et al.
(2008),  the  model  reverses  the  direction  of  the  effect  observed 
at the onset of congruent vs. incongruent tasks following
uninformative cues. To explain this discrepancy between model
predictions and empirical results, we note that our simulations
sampled only a limited window of time following the onset of
the  task,  equivalent  to  200  ms  of  real  time  and  far  below  the 
2100  ms  repetition  time  used  by  Aarts  et  al.  to  obtain  their 
data.  During  this  window,  subjects  were  required  to  perform  the 
task  and  monitor  the  outcomes  of  their  behavioral  responses. 
We  therefore  simulated  the  Aarts  task  again,  this  time  using  a 
window  of  1000  ms  following  the  onset  of  the  target  task,  and 
find  that  the  discrepancy  between  congruent  and  incongruent 
trials  in  the  uninformed  condition  is  eliminated  (Figure  3C). 
In  this  simulation,  all  inter-stimulus  and  inter-trial  intervals 
were  identical  to  the  initial  simulation.  In  the  model,  increased 
activity  to  uninformed  congruent  trials  (relative  to  uninformed 
incongruent  trials)  in  the  first  200  ms  following  task  onset  is 
due  to  stronger  predictive  activity  related  to  the  almost  certain 
successful  completion  of  the  congruent  task.  At  longer  intervals, 
activity  for  uninformed  incongruent  trials  is  higher  relative  to 

FIGURE 4 | Bayesian surprise. The PRO model reproduces the pattern of
activity observed in ACC based on local estimates of the likelihood of
observing STOP or GO trials in a stop signal task. Activity at the onset of a
GO trial is higher when the estimated likelihood of observing a STOP trial is
high. Conversely, activity at the onset of a STOP trial is higher when the
estimated likelihood of a STOP trial is low.


uninformed congruent trials due to both the increased time
needed to perform an incongruent trial and surprise signals
related to both correct and incorrect performance. At the temporal
resolution at which data can be measured via fMRI, these early
and  late  components  of  the  task  are  not  separable.  Our  finding 
of  differential  model  activity  for  incongruent  vs.  congruent  trials 
during  early  periods  following  the  onset  of  the  target  task  is  a 
novel  prediction  of  the  PRO  model  which  may  be  tested  using 
techniques  with  higher  temporal  resolution  than  standard  fMRI 
allows. 

SIMULATION  5:  BAYESIAN  SURPRISE 

MPFC  activity  has  been  linked  to  computations  related  to 
Bayesian  decision-making.  In  previous  simulations,  we  showed 
how the PRO model might establish a link between mechanistic
models of mPFC and more abstract Bayesian models by showing
that it could reproduce effects of environmental volatility
(Behrens  et  al.,  2007)  as  estimated  by  a  Bayesian  algorithm. 
Recently,  Ide  et  al.  (2013)  applied  a  Bayesian  model  (the  Dynamic 
Belief  Model  (Yu  et  al.,  2009))  to  the  analysis  of  fMRI  data  from  a 
stop-signal  task.  The  Dynamic  Belief  Model  updates  its  estimation 
of  the  likelihood  of  observing  a  stop-signal  trial  based  on  the 
recent  history  of  stop  and  go  trials  that  have  been  observed. 
This  estimation  is  used  to  calculate  a  Bayesian  surprise  signal, 
essentially  the  unsigned  prediction  error  calculated  as  the  absolute 
difference  between  the  model’s  estimation  of  the  probability  of 
a  trial  type  and  the  actual  trial  type  observed.  The  PRO  model, 
which  at  its  core  is  a  model  concerned  with  predicting  likely  events 
and signaling discrepancies between expected and actual events,
accounts  for  the  data  in  much  the  same  way  as  reported  earlier 
(Ide  et  al.,  2013).  When  faced  with  a  Stop  trial,  the  activity  of 
the  PRO  model  is  higher  for  situations  in  which  recent  trials 
have  included  only  a  few  Stop  trials,  relative  to  situations  in 
which  recent  trials  have  had  a  higher  proportion  of  Stop  trials 
(Figure 4). Similarly, when given a Go trial, PRO model activity
is greater when the estimated likelihood of a Stop trial
is high than when it is low.
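
The surprise computation described above can be illustrated with a minimal sketch. It implements neither the Dynamic Belief Model nor the PRO model: as a simplifying assumption, the probability of a Stop trial is tracked with an exponentially weighted running average, and the function name `unsigned_surprise`, the learning rate `alpha`, and the initial estimate `p_stop` are hypothetical choices made only for illustration. Surprise on each trial is the absolute difference between the current estimate and the observed trial type (1 = Stop, 0 = Go).

```python
def unsigned_surprise(trials, alpha=0.2, p_stop=0.25):
    """Return the unsigned prediction error on each trial of a stop/go sequence.

    trials : iterable of 1 (Stop trial) or 0 (Go trial)
    alpha  : assumed learning rate of the running estimate (illustrative)
    p_stop : assumed initial estimate of the Stop probability (illustrative)
    """
    surprises = []
    for outcome in trials:
        # Surprise is computed before the estimate is updated.
        surprises.append(abs(outcome - p_stop))
        # Move the estimate toward the trial type that was just observed.
        p_stop += alpha * (outcome - p_stop)
    return surprises


# A Stop trial after a run of Go trials vs. after a run of Stop trials.
stop_after_go_run = unsigned_surprise([0, 0, 0, 0, 1])[-1]
stop_after_stop_run = unsigned_surprise([1, 1, 1, 1, 1])[-1]

# A Go trial after a run of Stop trials vs. after a run of Go trials.
go_after_stop_run = unsigned_surprise([1, 1, 1, 1, 0])[-1]
go_after_go_run = unsigned_surprise([0, 0, 0, 0, 0])[-1]
```

Under these assumptions, a Stop trial that follows a run of Go trials yields a larger surprise than one that follows a run of Stop trials, and the reverse holds for Go trials, which is the qualitative pattern summarized in Figure 4.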

SIMULATION  6:  SINGLE-UNIT  ACTIVITY 

A  major  strength  of  the  original  PRO  model  is  its  ability  to 
account for effects related to both the activity of ensembles
of neurons (fMRI and EEG) and the activity of single
neurons  within  mPFC.  Here  we  demonstrate  that,  by  extending 
the  PRO  model  to  predict  events,  broadly  construed,  it  is  capable 
of  capturing  additional  single-unit  data  related  to  the  occurrence 
of  task-related  stimuli.  In  earlier  work  (Sallet  et  al.,  2007),  single 
neurons  in  monkey  ACC  were  observed  whose  activity  following 
the  presentation  of  an  initial  cue  was  specific  to  the  amount  of 
reward  to  be  eventually  received  by  the  monkey:  cues  indicating 
small rewards activated a different population of neurons than did
cues  indicating  large  rewards.  Following  a  delay  after  the  initial 
cue,  an  additional  cue  was  presented.  On  the  majority  of  trials 
(75%), the second cue was identical to the initial cue: if the first cue
indicated  a  small  reward,  the  second  cue  did  as  well.  On  25%  of 
trials,  however,  the  second  cue  indicated  a  different  reward  than 
did  the  first  cue;  if  the  monkey  had  initially  been  shown  the  small 
reward  cue,  it  would  now  be  shown  the  large  reward  cue,  and 
vice-versa. 

The  authors  identified  two  groups  of  neurons  that  appeared 
to  code  for  the  gain  or  loss  of  reward  associated  with  infrequent 
cue  switches.  One  group  showed  a  large  increase  in  activity  in 
response  to  being  shown  a  large  reward  cue  after  having  been 
initially  shown  a  small  reward  cue.  These  same  neurons  also 
responded  (although  somewhat  more  weakly)  when  the  initial 
cue  shown  to  the  monkey  was  associated  with  the  large  reward.  A 
second  group  of  neurons  showed  the  reverse  pattern,  responding 
strongly  when  a  second  cue  indicated  a  small  reward  following  an 
initial  cue  signaling  a  large  reward,  and  responding  more  weakly 
when  the  initial  cue  indicated  a  small  reward.  This  pattern  of 
activity is interpreted by the authors as evidence for the hypothesis
that mPFC neurons code not only for unexpected events but also
specifically for reward gains and losses.

The  notion  that  mPFC  neurons  signal  discrepancies,  both 
positive and negative, between expected and actual reward magnitudes
in separate neuronal populations is broadly consistent with
the  theory  underlying  the  PRO  model  insofar  as  the  PRO  model 
characterizes  mPFC  as  a  region  involved  in  signaling  deviations 
from  expectations.  The  extended  PRO  model  is  able  to  capture 
the  pattern  of  effects  observed  by  Sallet  et  al.  (2007),  as  shown 
in  Figure  5.  Rather  than  specifically  coding  for  gains  and  losses, 
however, the PRO model suggests that increased activity following
an unexpected second cue represents the unexpected non-occurrence
of a predicted cue. This interpretation applies as well
to activity observed at the presentation of the initial cue, where the
prediction of the presentation of either a cue indicating a small
magnitude reward or a cue indicating a large magnitude reward is
unmet.
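
The following sketch works through this interpretation numerically. It is an illustration rather than the simulation code: it assumes that predictions have already converged to the 75%/25% cue contingencies described above, that SMALL and LARGE initial cues are equally likely (an assumption not stated in the text), and that surprise can be summarized by rectified differences between predicted and observed events. The function name `surprise` and the vector layout `[SMALL, LARGE]` are hypothetical.

```python
def surprise(predicted, observed):
    """Return (unexpected non-occurrence, unexpected occurrence).

    predicted : predicted probability of each event
    observed  : 1.0 for the event that occurred, 0.0 otherwise
    """
    # Events that were predicted but did not occur.
    non_occurrence = sum(max(0.0, p - o) for p, o in zip(predicted, observed))
    # Events that occurred but were not fully predicted.
    occurrence = sum(max(0.0, o - p) for p, o in zip(predicted, observed))
    return non_occurrence, occurrence


# Event order in every vector: [SMALL cue, LARGE cue].
before_first_cue = [0.5, 0.5]    # assumed: both initial cues equally likely
after_small_cue = [0.75, 0.25]   # second cue repeats the first on 75% of trials

initial_cue = surprise(before_first_cue, [1.0, 0.0])      # SMALL cue shown first
consistent_cue = surprise(after_small_cue, [1.0, 0.0])    # SMALL then SMALL
switched_cue = surprise(after_small_cue, [0.0, 1.0])      # SMALL then LARGE
```

With these numbers the unexpected non-occurrence term is largest for the switched second cue (0.75), intermediate for the initial cue (0.5), and smallest for a consistent second cue (0.25), without any explicit coding of reward gain or loss.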

DISCUSSION 

In  this  article,  we  have  presented  an  extended  implementation  of 
the  PRO  model  of  mPFC,  and  conducted  a  number  of  simulations 

FIGURE 5 | Single-unit data (cf. Sallet et al., 2007, Figure 7). The
activity of single units in the PRO model corresponds with data showing
populations of neurons within ACC whose activity appears to code for
LARGE (A and B) or SMALL (C and D) reward magnitudes. When
presented with an initial cue indicating either reward magnitude (left
panels), a subset of individual units remain active while the activity of
other units falls to 0. Following a second GO cue (center panels), individual
units appear to indicate surprising gains or losses, as would be the case
when a LARGE reward is initially cued, followed by a second cue
indicating a SMALL reward (top panels), and vice versa (bottom panels).
Model activity is greater for GO cues than for initial cues when the reward
magnitude indicated by the two cues is consistent (right panels), while
activity is maximal for GO cues which are inconsistent with initial cues,
either indicating a greater or lesser reward.


showing that, using this extended framework, the model can capture
an additional range of effects observed within mPFC primarily
related to the detection and processing of task-related stimuli.
The  extended  PRO  model  is  not  different  from  the  original  PRO 
model,  in  that  it  uses  the  same  formal  equations  and  parameter 
values.  The  key  innovation  underlying  our  extension  to  the  model 
is conceptual: we treat stimuli and outcomes, elements of the
study  of  behavior  that  have  long  existed  at  opposite  ends  of  a 
trial,  as  being  functionally  equivalent  in  terms  of  their  ability  to 
serve  as  the  basis  for  future  predictions  and  to  signal  discrepancies 
between  expected  and  actual  events. 

In  our  previous  work,  we  noted  that  the  PRO  model  offered 
a  unifying  account  of  mPFC  activity  in  the  context  of  cognitive 
control.  The  PRO  model  posited  two  main  signals  of  prediction 
and comparison (i.e., prediction error) (Alexander and Brown,
2010, 2011; Brown, 2013). These functions are consistent with a
variety of recent empirical results (Kennerley et al., 2011; Hayden
et al., 2011a), and the prediction error signals may be a key signal
that updates behavior (Hayden et al., 2011b; Kolling et al., 2012).
Our recent neuroimaging findings show distinct prediction and
prediction error regions within the mPFC, consistent with the
PRO model (Jahn et al., 2014). In the present manuscript, we
extend  the  earlier  PRO  model  account  to  include  experimental 
paradigms  not  explicitly  related  to  response  generation.  Indeed, 
recent  studies  outside  the  purview  of  the  original  PRO  model  have 
yielded  results  that  are  readily  interpretable  within  the  framework 
of  the  extended  PRO  model,  including  findings  regarding  mPFC 
activity when monitoring the actions of others (Apps et al.,
2012), during tasks focusing on predicting and detecting painful
stimuli (Büchel et al., 2002; Chandrasekhar et al., 2008), or
processing unexpected salient stimuli (Talmi et al., 2013). The
extended  PRO  model  here  may  be  viewed  as  continually  trying 
to  build  an  accurate  internal  model  of  the  environment.  Every 
surprising  event  in  turn  adjusts  the  model  to  minimize  future 
surprise.  In  that  sense,  the  model  is  generally  consistent  with 
the  theoretical  principles  of  free  energy  minimization  (Friston, 
2010). 

In  addition  to  accounting  for  a  new  set  of  neural  data,  our 
present  simulations  provide  further  evidence  in  support  of  the 
role of mPFC in model-based RL (Dayan and Niv, 2008), implicating
the mPFC in building internal models of the environment.
Other studies (Gläscher et al., 2010; Ide et al., 2013) have identified
signals in the brain that appear to be consistent with some
form of model-based RL (as opposed to model-free RL), including
signals that occur in regions that are known to interact with
mPFC.  Model-based  RL  is  distinct  from  model-free  RL  in  that 
it  is  concerned  with  learning  a  model  of  an  environment,  often 
rendered  as  a  state-transition  matrix  containing  the  estimated 
probabilities  of  transitioning  from  one  state  to  another  (Simon 
and  Daw,  2011),  while  model-free  RL  uses  a  scalar  value  signal 
to  improve  estimates  of  future  rewards.  Neurally,  model-free  RL 
is generally considered to involve primarily subcortical structures
heavily innervated by dopamine neurons, including nucleus
accumbens and striatum, areas that are frequently observed to
respond to value and reward in decision-making tasks, and substantial
research has linked DA activity in VTA to model-free RL
(Cardinal  and  Cheung,  2005;  Daw  and  Doya,  2006;  Doya,  2007; 
Cohen  et  al.,  2009). 
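
The contrast drawn above between the two forms of RL can be sketched in a few lines. This is a generic textbook-style illustration and not the PRO model: the function names `update_transition_row` and `update_value` and the learning rate `eta` are hypothetical. The model-based update adjusts one row of a state-transition matrix using a vector-valued state prediction error, whereas the model-free update adjusts a single value estimate using a scalar reward prediction error.

```python
def update_transition_row(row, next_state, eta=0.1):
    """Model-based style update of one row of a state-transition matrix.

    row        : estimated probabilities of moving to each state (modified in place)
    next_state : index of the state that was actually entered
    Returns the vector state prediction error (one entry per state).
    """
    target = [1.0 if s == next_state else 0.0 for s in range(len(row))]
    error = [t - p for t, p in zip(target, row)]
    for s, e in enumerate(error):
        row[s] += eta * e
    return error


def update_value(value, reward, eta=0.1):
    """Model-free style update: a single scalar reward prediction error."""
    delta = reward - value
    return value + eta * delta, delta


# Three states with initially uniform transition estimates.
n_states = 3
transitions = [[1.0 / n_states] * n_states for _ in range(n_states)]

# State 0 is repeatedly followed by state 2.
for _ in range(50):
    state_error = update_transition_row(transitions[0], next_state=2)

# A reward of 1.0 is repeatedly received.
value = 0.0
for _ in range(50):
    value, reward_error = update_value(value, reward=1.0)
```

Because the target vector sums to one, each updated row continues to sum to one, so the row remains a valid probability distribution while the estimate for the observed transition approaches 1.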

Although it is generally accepted that complex cognitive
behaviors  such  as  planning  and  decision-making  require  that  a 
model  of  the  world  be  learned  and  maintained,  it  is  still  unclear 
what  regions  of  the  brain  govern  how  or  when  a  model  is  learned, 
or  which  regions  are  involved  in  maintaining  that  model.  We 
previously  noted  that  the  vector  error  signal  calculated  by  the  PRO 
model  is  consistent  with  a  state  prediction  error,  although  we  note 
that  the  PRO  model  is  not  itself  a  model-based  RL  algorithm  per 
se. However, it does suggest that activity in mPFC may be used as a
learning signal by other brain regions that are directly involved in
model maintenance (Alexander and Brown, 2011). A likely candidate
in this regard is dorsolateral prefrontal cortex (PFC), a region
implicated  in  working  memory  and  rule  representation  (Wallis 
et  al.,  2001;  Nee  and  Brown,  2013;  Mian  et  al.,  2014)  and  known  to 
project  reciprocally  to  mPFC  (Barbas  and  Pandya,  1989).  Another 
possible  substrate  of  model-based  prediction  is  the  hippocampus 
(van  der  Meer  and  Redish,  2010).  Future  work  should  investigate 
how the interaction of these regions may contribute to model-based
RL.

Our  results  show  that  the  essential  functions  of  the  PRO 
model,  namely  that  of  prediction  and  detection  of  discrepancy, 
can  account  for  a  range  of  results  primarily  related  to  processing 
stimulus-related  information.  This  suggests  a  role  for  mPFC  in 
processes  related  to  attention  or  attention-like  processes.  Previous 
associative (Mackintosh, 1975; Pearce and Hall, 1980), connectionist
(Kruschke, 2001), and RL models (Alexander, 2007) have
exploited  prediction  errors  to  drive  attentional  learning.  One 
possible  role  of  the  mPFC  signal  may  therefore  involve  allocating 
attention to relevant stimuli. We do not claim, however, that the
mPFC is the only brain region that signals prediction error.
There  is  evidence  that  other  regions  including  the  cerebellum 
may  also  signal  prediction  errors  (Blakemore  et  al.,  2001).  An 
important question raised by our results concerns the distinction
between the functions of orbitofrontal cortex (OFC) and
mPFC. It has previously been thought that these two regions play
complementary roles in decision making, with mPFC encoding
action values while OFC encodes the value of stimuli (Goldstein
et al., 2007; Rudebeck et al., 2008; Camille et al., 2011;
Kennerley et al., 2011). The extension of the PRO model to
include prediction of events in general (rather than solely predicting
the consequences of actions) blurs this otherwise appealing
distinction. A recent computational model (Wilson et al.,
2014)  interprets  OFC  as  being  involved  in  state  representation, 
and  thus,  in  conjunction  with  the  PRO  model,  may  provide 
an  alternative  account  for  the  distinct,  complementary  roles  of 
the two regions in model-based RL. Specifically, state representations
maintained by OFC may serve as the basis for predictions
generated within mPFC, while prediction errors signaled
by  mPFC  may  provide  information  relevant  to  determining  task 
state  to  OFC.  More  generally,  while  the  PRO  model  accounts 
for  a  range  of  data  observed  in  mPFC,  the  region  is  highly 
interconnected  with  additional  areas  of  the  brain  whose  function 
may represent variables in the PRO model that do not appear to be
directly related to mPFC activity, including stimulus/state
representation, the relative value of immediate options (Boorman
et al., 2013), or the implementation of top-down control.
Our results organize a wide range of data on the mPFC in
an expanded theoretical framework, which suggests that mPFC
learns  to  predict  the  outcomes  of  salient  events  in  general,  and 
provide  critical  constraints  on  the  function  of  regions  with  which 
mPFC  interacts. 

A potential weakness of the current study relates to the dependence
of the PRO model on consistent inter-event timing in order
to  converge  on  predictions  reflecting  the  likelihood  of  observing 
an  event.  This  weakness  has  been  noted  in  other  reports  (Jahn 
et  al.,  2014),  and  is  due  to  the  formulation  of  the  model  based 
on  TD  learning  and  the  temporal  representation  of  stimuli  as  a 
tapped-delay  line.  The  manner  in  which  stimuli  are  represented 
through  time  by  the  brain,  and  how  that  representation  informs 
activity  in  mPFC,  is  likely  more  sophisticated  than  the  scheme 
implemented  in  the  PRO  model.  While  mPFC  is  known  to 
be  sensitive  to  violations  of  temporal  expectancies  (Yeung  and 
Nieuwenhuis,  2009;  Forster  and  Brown,  2011;  Grinband  et  al., 
2011),  it  is  generally  assumed  that  jittered  delay  intervals  do  not 
unduly  influence  BOLD  activity  related  to  underlying  cognitive 
processes,  and  the  use  of  consistent  inter-event  timing  in  our 
simulations  reflects  this  assumption.  However,  to  the  extent  that 
mPFC  activity  reflects  deviations  from  temporal  expectancies 
in  addition  to  effects  related  to  cognitive  processes,  it  may  be 
necessary  to  re-evaluate  our  current  interpretations  of  mPFC 
activity  in  the  context  of  a  more  realistic  model  of  temporal 
representation. 
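
The timing dependence can be made concrete with a simplified sketch. It is not the PRO model: a plain delta rule stands in for TD learning, and the function name `learn_tap_weights`, the number of taps, and the learning rate `alpha` are hypothetical. Each tap of the delay line is assumed to be active at exactly one time step after cue onset, so the prediction at step k is simply the weight on tap k.

```python
def learn_tap_weights(lags, n_taps=6, alpha=0.2):
    """Learn one prediction weight per delay tap with a delta rule.

    lags : for each trial, the number of time steps between cue onset
           and the predicted event
    Tap k is active only k steps after the cue, so weights[k] is the
    predicted probability that the event occurs at that step.
    """
    weights = [0.0] * n_taps
    for lag in lags:
        for k in range(n_taps):
            target = 1.0 if k == lag else 0.0
            weights[k] += alpha * (target - weights[k])
    return weights


# Event always occurs 3 steps after the cue.
fixed_timing = learn_tap_weights([3] * 40)

# Event always occurs, but alternates between 2 and 4 steps after the cue.
jittered_timing = learn_tap_weights([2, 4] * 20)
```

With fixed timing, the weight on the single relevant tap approaches 1, reflecting the true likelihood of the event. With jittered timing, the same event is spread across two taps whose weights each hover near 0.5, so no single time step carries a prediction that reflects how likely the event actually is.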